Learning and Optimization for Sequential Decision Making, 02/01/16, Lecture 4: Thompson Sampling (part 1)

Author

  • Erik Waingarten

Abstract

Consider the problem of learning a parametric distribution from observations. A frequentist approach to learning treats the parameters as fixed and uses the data to learn them as accurately as possible. For example, consider the problem of learning the parameter of a Bernoulli distribution (a random variable distributed as Bernoulli(μ) is 1 with probability μ and 0 with probability 1 − μ). We are given 10 independent samples: 0, 0, 1, 1, 0, 1, 1, 1, 0, 0
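To make the frequentist setup concrete, here is a small Python sketch that computes the maximum-likelihood estimate of μ from the ten samples above (the fraction of ones). For contrast with the Bayesian view that Thompson Sampling builds on, it also computes a Beta posterior; the uniform Beta(1, 1) prior is an assumption made for illustration, not something stated in the abstract.

```python
from fractions import Fraction

# The 10 independent Bernoulli samples listed in the abstract.
samples = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]

# Frequentist view: mu is a fixed unknown constant. Its maximum-likelihood
# estimate is the fraction of samples equal to 1.
mu_mle = Fraction(sum(samples), len(samples))

# Bayesian view (assumption: uniform Beta(1, 1) prior on mu). The Beta prior is
# conjugate to the Bernoulli likelihood, so the posterior is
# Beta(1 + number of ones, 1 + number of zeros).
ones = sum(samples)
zeros = len(samples) - ones
alpha_post, beta_post = 1 + ones, 1 + zeros
posterior_mean = Fraction(alpha_post, alpha_post + beta_post)

print(mu_mle)                 # 1/2
print(alpha_post, beta_post)  # 6 6
print(posterior_mean)         # 1/2
```

Here the data contain five ones and five zeros, so both the point estimate and the posterior mean equal 1/2; the posterior additionally quantifies how uncertain that estimate still is after only ten samples.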

Similar resources

IEOR 8100-001: Learning and Optimization for Sequential Decision Making 02/03/16 Lecture 5: Thompson Sampling (part II): Regret bounds proofs

We describe the main technical difficulties in the proof for the TS algorithm as compared to the UCB algorithm. In the UCB algorithm, the suboptimal arm 2 will be played at time $t$ if its UCB value is higher, i.e., if $\mathrm{UCB}_{2,t-1} > \mathrm{UCB}_{1,t-1}$. If arm 2 has already been pulled $\Omega(\log(T)/\Delta^2)$ times, then with high probability this will not happen. This is because after $n_{2,t} \geq \Omega(\log(T)/\Delta^2)$, using c...
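The play rule described above can be illustrated with a small simulation. The Python sketch below runs a hypothetical two-armed Bernoulli bandit (arm means 0.7 and 0.5, so Δ = 0.2, horizon T = 10,000) under a UCB rule and under Thompson Sampling, and counts how often the suboptimal arm 2 is played. The UCB1-style confidence radius sqrt(2 ln t / n), the Beta(1, 1) priors, the arm means, and the seed are all assumptions chosen for illustration; the lecture's exact constants may differ, and this is not the proof technique from the notes.

```python
import math
import random


def ucb_choice(counts, sums, t):
    """Return the arm with the largest UCB index.

    Assumption: UCB1-style confidence radius sqrt(2 ln t / n).
    """
    # Play every arm once so that each empirical mean is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
           for a in range(len(counts))]
    return max(range(len(counts)), key=lambda a: ucb[a])


def ts_choice(counts, sums, rng):
    """Return the arm whose Beta-posterior sample is largest.

    Assumption: independent Beta(1, 1) (uniform) priors on each arm's mean.
    """
    draws = [rng.betavariate(1 + sums[a], 1 + counts[a] - sums[a])
             for a in range(len(counts))]
    return max(range(len(counts)), key=lambda a: draws[a])


def run(policy, means, horizon, seed=0):
    """Simulate a Bernoulli bandit and return the pull count of each arm."""
    rng = random.Random(seed)
    counts = [0] * len(means)  # n_{a,t}: number of times arm a was pulled
    sums = [0] * len(means)    # total reward collected from arm a
    for t in range(1, horizon + 1):
        if policy == "ucb":
            arm = ucb_choice(counts, sums, t)
        else:
            arm = ts_choice(counts, sums, rng)
        reward = 1 if rng.random() < means[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical instance: index 0 is the optimal "arm 1", index 1 is "arm 2".
means = [0.7, 0.5]
T = 10_000
results = {policy: run(policy, means, T) for policy in ("ucb", "ts")}
for policy, counts in results.items():
    print(policy, "pulls of suboptimal arm 2:", counts[1])
```

Under these assumptions both policies should play arm 2 far fewer than T/2 times; the exact counts depend on the seed. The UCB rule stops favouring arm 2 once its confidence radius shrinks below roughly Δ, which is where the $\log(T)/\Delta^2$ scaling comes from, whereas Thompson Sampling's choice depends on random posterior draws, which is what makes its analysis harder.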

The End of Optimism

Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the optimism principle and Thompson sampling. Prior analysis has mostly focussed on the worst-case setting. We analyse the asymptotic regret and show matching upper and lower...

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains largely unsolved. Thompson Sampling and its extension to reinforcement learning provide an elegant approach to exploration that only requires access to post...

The Effect of Lecture in comparison with Lecture and Problem Based Learning on Nursing Students Self-Efficacy in Najafabad Islamic Azad University

Introduction: Self-efficacy has an important role in applying scientific and professional knowledge and skills. Teaching methods can develop different skills such as decision making capability. The aim of this study was to determine the effect of teaching method of lecture in comparison with lecture and problem based learning on nursing students self-efficacy in Najafabad Islamic Azad Universit...

Publication date: 2016